KNN | model building

In this tutorial, we will build a KNN (k-nearest neighbors) machine learning model.

We will work with a dataset whose columns have no meaningful names; they are just arbitrary features, and we will select one column as our target variable.

First, we read our file:
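A minimal sketch of this step, assuming the data is a CSV file read with pandas. The file name is not given here, so the example reads a tiny in-memory CSV instead; the column names (f1, f2, target) and the values are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for the real file: a tiny CSV held in memory.
# Column names and values are hypothetical.
csv_text = """f1,f2,target
0.91,1.16,1
0.64,1.00,0
0.72,1.20,0
1.23,1.39,1
"""

# With a file on disk, pass its path to pd.read_csv instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```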

Now let us check whether the dataset has any missing values or not:

df.info()
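As a small illustration, df.info() reports the non-null count per column, and df.isnull().sum() gives the number of missing values directly. The frame below is a hypothetical stand-in with one missing value, not the tutorial's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column "f2"
df = pd.DataFrame(
    {"f1": [0.9, 0.6, 0.7], "f2": [1.1, np.nan, 1.2], "target": [1, 0, 0]}
)

df.info()                    # prints non-null count and dtype for each column
missing = df.isnull().sum()  # number of missing values in each column
print(missing)
```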

Let us break the dataset into two parts:

  1. Dependent variable (target variable) = y
  2. Independent variables = x
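The split above can be sketched as follows, assuming the target column is called target (a hypothetical name; use the column chosen in your own dataset).

```python
import pandas as pd

# Hypothetical data; "target" is an assumed name for the target column
df = pd.DataFrame(
    {
        "f1": [0.91, 0.64, 0.72, 1.23],
        "f2": [1.16, 1.00, 1.20, 1.39],
        "target": [1, 0, 0, 1],
    }
)

y = df["target"]               # dependent (target) variable
x = df.drop("target", axis=1)  # independent variables: every other column

print(x.shape, y.shape)
```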

Before we fit a KNN model, there is one more step to apply to the data: “scaling”.

Scaling:

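Every numerical feature has a “MAGNITUDE” and a “UNIT”. Features measured in different units can span very different ranges, and because KNN relies on distances between points, the features with the largest ranges would dominate the result. A scaler transforms every feature to a comparable scale; the standard scaler used here rescales each column to zero mean and unit variance.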
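To make the idea concrete, here is a small worked sketch of standardization done by hand with NumPy: each value has the column mean subtracted and is divided by the column standard deviation. The numbers are made up for illustration.

```python
import numpy as np

# Made-up feature values, e.g. heights in centimetres
values = np.array([150.0, 160.0, 170.0, 180.0, 190.0])

mean = values.mean()  # 170.0
std = values.std()    # population standard deviation (ddof=0)

# Standardized values: zero mean, unit variance, no unit attached
scaled = (values - mean) / std
print(scaled)
```

After this transformation the column is centred on 0, so features that were originally measured on very different scales become comparable.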

We import StandardScaler from the sklearn library.

We have to scale our “x” dataset; the “y” column is already a boolean value, so it does not need scaling.

To build a dataframe of scaled values, we cannot pass the scaler itself to a dataframe; we first have to use it to transform the “x” values.

Now that the “x” values are transformed, we create a dataframe that holds these scaled values instead of the original ones.
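The scaling steps above can be sketched as follows. The data and the names scaler, scaled_values and x_scaled are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical independent variables on very different scales
x = pd.DataFrame(
    {"f1": [150.0, 160.0, 170.0, 180.0], "f2": [0.2, 0.4, 0.1, 0.3]}
)

scaler = StandardScaler()
scaler.fit(x)                        # learn each column's mean and standard deviation
scaled_values = scaler.transform(x)  # returns a NumPy array, not a dataframe

# Wrap the scaled array in a dataframe that keeps the original column names
x_scaled = pd.DataFrame(scaled_values, columns=x.columns)
print(x_scaled)
```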

The next step is similar to that of linear and logistic regression.

Let us split the data into train and test sets:
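A minimal sketch of the split using train_test_split from sklearn. The synthetic data from make_classification, the 30% test size and the random_state value are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption; not the tutorial's dataset)
x, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 30% of the rows for testing; random_state makes the split repeatable
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=101
)
print(x_train.shape, x_test.shape)
```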

Now we import the KNN model, KNeighborsClassifier, from the sklearn library.

Now let us fit the training data to the KNN model:
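A sketch of the import and fit, using synthetic stand-in data so that the example runs on its own (the data, split settings and the name knn are assumptions).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption; not the tutorial's dataset)
x, y = make_classification(n_samples=200, n_features=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=101
)

# n_neighbors is the k in KNN
knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(x_train, y_train)
```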

The value we pass to KNeighborsClassifier is k. Here the value of k is 2. We have chosen this value arbitrarily; later we will explore the elbow method and come back to change it. For the time being, our k value is 2.

Now let us make a prediction and evaluate our model.

Predicting the target variable:
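A sketch of the prediction step on synthetic stand-in data (the data, split settings and variable names are assumptions).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption; not the tutorial's dataset)
x, y = make_classification(n_samples=200, n_features=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=101
)

knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(x_train, y_train)

# Predict the target for rows the model has not seen during fitting
pred = knn.predict(x_test)
print(pred[:10])
```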

Evaluating the model:
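One common way to evaluate a classifier is with a confusion matrix and a classification report from sklearn.metrics. The sketch below assumes that approach and uses synthetic stand-in data, so its numbers will differ from the results discussed in this tutorial.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption; not the tutorial's dataset)
x, y = make_classification(n_samples=200, n_features=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=101
)

knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(x_train, y_train)
pred = knn.predict(x_test)

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_test, pred, labels=[0, 1])
print(cm)

# Precision, recall, f1-score and overall accuracy
print(classification_report(y_test, pred))
```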

As we can see, our model is not performing well: its accuracy is only about 75%. This suggests that our guess for k was a poor one. Now let us select the best value for k.

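import numpy as np

new_k = []  # error rate for each candidate value of k

for a in range(1, 100):
    Knn = KNeighborsClassifier(n_neighbors=a)
    Knn.fit(x_train, y_train)
    pred_a = Knn.predict(x_test)
    # fraction of test rows that were predicted incorrectly
    new_k.append(np.mean(pred_a != y_test))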

This loop records the error rate for each value of k, and when we plot these error rates against k we get a curve known as the elbow curve.

Now let us plot the elbow curve. 
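A sketch of the plot with matplotlib, on synthetic stand-in data. The range of k is shortened to 1 to 39 to suit the small example dataset, and the plot styling and names such as error_rate are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption; not the tutorial's dataset)
x, y = make_classification(n_samples=200, n_features=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=101
)

ks = list(range(1, 40))
error_rate = []
for k in ks:
    model = KNeighborsClassifier(n_neighbors=k).fit(x_train, y_train)
    error_rate.append(np.mean(model.predict(x_test) != y_test))

# Error rate against k: this is the elbow curve
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(ks, error_rate, marker="o", linestyle="dashed")
ax.set_title("Error rate vs. k")
ax.set_xlabel("k")
ax.set_ylabel("Error rate")
```

In a notebook or other interactive session, drop the matplotlib.use("Agg") line and call plt.show() to display the figure.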

We started with k=2, which shows a high error rate. Now let us find the value of k with the lowest error rate (nearly k = 23) and refit the model with it.
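A sketch of picking the k with the lowest error rate and comparing it with k=2. It runs on synthetic stand-in data, so the best k it finds will generally not be 23; the helper accuracy_for and the other names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption; not the tutorial's dataset)
x, y = make_classification(n_samples=200, n_features=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=101
)


def accuracy_for(k):
    """Fit KNN with the given k and return its accuracy on the test set."""
    model = KNeighborsClassifier(n_neighbors=k).fit(x_train, y_train)
    return float(np.mean(model.predict(x_test) == y_test))


ks = list(range(1, 40))
error_rate = [1 - accuracy_for(k) for k in ks]

# k with the lowest error rate (first one if several are tied)
best_k = ks[int(np.argmin(error_rate))]

print("accuracy with k=2:", accuracy_for(2))
print("best k:", best_k, "accuracy:", accuracy_for(best_k))
```

Note that choosing k on the same test set used for the final score tends to give an optimistic estimate; a separate validation set or cross-validation is a more careful way to tune k.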

As we can see, our model has gained a good amount of accuracy.

In the upcoming tutorials we will learn about

  • Decision trees 
  • Random forests